Introduction to Information Theory 


Information Theory 


► Information theory is concerned with the 
fundamental limits of communication. 


► What is the ultimate limit of data compression, 
e.g. how many bits are required to represent 
the source output? 


► What is the ultimate limit of reliable 
communication over a noisy channel, e.g. how 
many bits can be sent in one second over 
a given channel? 
Information Theory 


► Information theory addresses and answers the two 
fundamental questions of communication theory: 


1. What is the ultimate data compression? 

(answer: the entropy of the data, H, is its compression 
limit) 

2. What is the ultimate transmission rate of 
communication? 

(answer: the channel capacity, C, is its rate limit.) 



Information Theory 

► How should information be measured? 

► How much additional information is gained by 
some reduction in uncertainty? 

► How do the a priori probabilities of possible 
messages determine the information? 

► What is the information content of a random 
variable? 

► How does the noise level in a communication 
channel limit its capacity to transmit information? 

► How does the bandwidth (in cycles/second) of a 
communication channel limit its capacity to 
transmit information? 


Communication System 





























Coding theory 


► Coding theory is concerned with practical 
techniques to realize the limits specified by 
information theory. 

► Source coding converts source output to bits. 

► Channel coding adds extra bits to data 
transmitted over the channel. 

► This redundancy helps combat the errors 
introduced in transmitted bits due to channel 
noise. 


Source Coding 


► Based on the characteristics/features of a source, the source 
encoder-decoder pair is designed to reduce the source 
output to a minimal representation. 

► How to model a signal source? → Information, Entropy 

► How to measure the content of a source? 

► How to represent a source? → Code design 

► How to model the behavior of a channel? → Channel capacity 

► Redundancy reduction → Data compression 
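As a small illustration of redundancy reduction, the sketch below compares a fixed-length code with a variable-length prefix code. The four-symbol source, its probabilities, and the code words are hypothetical choices made for this example; they do not come from the slides.

```python
# Hypothetical source: symbols, probabilities, and code words are
# illustrative assumptions, not taken from the slides.
PROBS = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}  # prefix-free code


def encode(symbols):
    """Concatenate the code word of every source symbol."""
    return "".join(CODE[s] for s in symbols)


def decode(bits):
    """Decode by reading bits until the buffer matches a code word."""
    inverse = {word: sym for sym, word in CODE.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return "".join(out)


def average_length():
    """Expected number of code bits per source symbol."""
    return sum(PROBS[s] * len(CODE[s]) for s in PROBS)


message = "abacad"
bits = encode(message)
print(bits, len(bits))           # a fixed 2-bit code would need 12 bits here
print(decode(bits) == message)   # the prefix code is uniquely decodable
print(average_length())          # compare with 2 bits/symbol for a fixed-length code
```

Frequent symbols get short code words, so the average length drops below the 2 bits per symbol that a fixed-length code for four symbols would need.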


Channel coding 

► Channel coding introduces redundancy at the channel encoder and 
uses this redundancy at the decoder to reconstitute 
the input sequences as accurately as possible, i.e., 
channel coding is designed to minimize the effect of 
the channel noise. 
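One of the simplest ways to add redundancy is a repetition code with majority-vote decoding. The sketch below is only an illustration of the idea: the repetition factor of 3, the data bits, and the positions of the flipped bits are assumptions for this example, and the slides do not prescribe this particular code.

```python
def encode(bits, n=3):
    """Repeat every data bit n times (n is an illustrative choice)."""
    return [b for b in bits for _ in range(n)]


def decode(received, n=3):
    """Majority vote over each block of n received bits."""
    return [
        1 if sum(received[i:i + n]) > n // 2 else 0
        for i in range(0, len(received), n)
    ]


data = [1, 0, 1, 1]
sent = encode(data)

# Simulate channel noise by flipping one bit in each of the first two blocks.
received = sent.copy()
received[1] ^= 1
received[5] ^= 1

print(decode(received) == data)  # single errors per block are corrected
```

The price of this protection is rate: three channel bits are sent for every data bit.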


[Figure: channel model; labels: Noise (n), Noiseless Channel] 
What is Information? 


► A quantitative measure of the information contained in an 
event should have the following properties: 

► 1. Information contained in events ought to be defined 
in terms of some measure of the uncertainty of the 
events. 

► 2. Less certain events ought to contain more 
information than more certain events. 

► 3. The information of unrelated/independent events 
taken as a single event should equal the sum of the 
information of the unrelated events. 
Information Properties 

► We want the information measure $I(p)$ to have several properties: 

1. Information is a non-negative quantity: $I(p) \ge 0$. 

2. If an event has probability 1, we get no information 
from the occurrence of the event: $I(1) = 0$. 

3. If two independent events occur (whose joint 
probability is the product of their individual 
probabilities), then the information we get from 
observing the events is the sum of the two informations: 
$I(p_1 \cdot p_2) = I(p_1) + I(p_2)$. 
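The three properties can be checked numerically for one candidate measure. The sketch below assumes the logarithmic measure I(p) = log2(1/p) (in bits); the function name `info` and the sample probabilities are illustrative choices.

```python
import math


def info(p):
    """Candidate measure I(p) = log2(1/p), in bits, for 0 < p <= 1."""
    return math.log2(1.0 / p)


p1, p2 = 0.5, 0.25  # hypothetical probabilities of two independent events

print(info(p1) >= 0 and info(p2) >= 0)                         # property 1
print(info(1.0) == 0.0)                                        # property 2
print(math.isclose(info(p1 * p2), info(p1) + info(p2)))        # property 3
```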


Information Source 


$s_1, s_2, \dots, s_q$ : Source alphabet 

$p_1, p_2, \dots, p_q$ : Probability 


1. The information content is somewhat inversely related to the probability 
of occurrence. 

2. The information content from two different independent symbols is the 
sum of the information content from each separately. 

3. Since the probability of two independent choices are multiplied together 
to get the probability of the compound event, it is natural to define the 
amount of information as 


$$I(s_i) = \log \frac{1}{p_i} \quad (\text{or } -\log p_i)$$

As a result, we have 

$$I(s_1) + I(s_2) = \log \frac{1}{p_1 p_2} = I(s_1, s_2)$$





Information Measure 


A natural measure of the uncertainty of an event $\alpha$ is the 
probability of $\alpha$, denoted $P(\alpha)$. 

The properties required of the information of an event $\alpha$, in terms 
of $P(\alpha)$, are satisfied if the information in $\alpha$ is defined as 

$$I(\alpha) = -\log P(\alpha)$$

This quantity is called the self-information. 

The base of the logarithm depends on the unit of 
information to be used. 
Information Measure 


Information Unit: 

$\log_2$ : bit 

$\log_e$ : nat 

$\log_{10}$ : Hartley 

Base conversions: 

$\log_{10} 2 = 0.30103$, $\log_2 10 = 3.3219$ 

$\log_{10} e = 0.43429$, $\log_e 10 = 2.30259$ 

$\log_e 2 = 0.69315$, $\log_2 e = 1.44270$ 

$$\log_a x = \frac{\log_b x}{\log_b a} = (\log_a b) \log_b x$$
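The change-of-base rule can be turned into a small unit converter. This is a minimal sketch: the function names `self_information` and `convert` are hypothetical helpers introduced here, and the event probability 1/8 is an arbitrary example.

```python
import math


def self_information(p, base=2.0):
    """I = log_base(1/p); base 2 gives bits, e gives nats, 10 gives Hartleys."""
    return math.log(1.0 / p, base)


def convert(value, from_base, to_base):
    """Convert an amount of information between logarithm bases.

    Uses log_to(x) = log_from(x) * log_to(from_base).
    """
    return value * math.log(from_base, to_base)


p = 1 / 8  # hypothetical event probability
bits = self_information(p, 2)
nats = self_information(p, math.e)
hartleys = self_information(p, 10)

print(bits, nats, hartleys)
print(math.isclose(convert(bits, 2, math.e), nats))
print(math.isclose(convert(bits, 2, 10), hartleys))
```

One bit corresponds to about 0.69315 nats or 0.30103 Hartleys, matching the conversion constants listed above.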






Information Properties 

1. $I(p^2) = I(p \cdot p) = I(p) + I(p) = 2 \cdot I(p)$ 

2. Thus, further, $I(p^n) = n \cdot I(p)$ 
(by induction) 

3. $I(p) = I((p^{1/m})^m) = m \cdot I(p^{1/m})$, so 
$I(p^{1/m}) = \frac{1}{m} \cdot I(p)$, and thus in general 

$$I(p^{n/m}) = \frac{n}{m} \cdot I(p)$$

4. And thus, by continuity, we get, for 
$0 < p < 1$ and $a > 0$ a real number: 

$$I(p^a) = a \cdot I(p)$$
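The power rule in step 4 is what forces the measure to be logarithmic. A short derivation sketch, taking the event of probability 1/2 as the reference (the choice of reference and the normalization are assumptions made here to fix the unit):

```latex
% Any 0 < p < 1 can be written as a power of 1/2:
%   p = (1/2)^a  with  a = \log_2 (1/p) > 0.
\begin{align*}
I(p) &= I\!\left(\left(\tfrac{1}{2}\right)^{a}\right)
      = a \cdot I\!\left(\tfrac{1}{2}\right)
      = I\!\left(\tfrac{1}{2}\right) \log_2 \frac{1}{p} .
\end{align*}
% Normalizing I(1/2) = 1 fixes the unit to bits:
\begin{align*}
I(p) &= \log_2 \frac{1}{p} = -\log_2 p .
\end{align*}
```

A different normalization of $I(1/2)$ only rescales the measure, which is the same as choosing a different logarithm base.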



Average Information 


► Entropy: the average information content over 
all symbols of the alphabet. 

$$H_r(S) = \sum_{i=1}^{q} p_i \log_r \frac{1}{p_i} = -\sum_{i=1}^{q} p_i \log_r p_i$$

$$H_r(S) = H_2(S) \cdot (\log_r 2)$$
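The entropy definition and the base-change relation $H_r(S) = H_2(S) \log_r 2$ can be checked with a few lines of code. This is a sketch under stated assumptions: the function name `entropy` and the three-symbol distribution are hypothetical, and terms with zero probability are taken to contribute nothing to the sum.

```python
import math


def entropy(probs, base=2.0):
    """H_r(S) = sum_i p_i * log_r(1/p_i); zero-probability terms are skipped."""
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)


probs = [0.5, 0.3, 0.2]  # hypothetical source distribution

h2 = entropy(probs, 2)   # bits/symbol
h3 = entropy(probs, 3)   # ternary units/symbol

print(h2, h3)
print(math.isclose(h3, h2 * math.log(2, 3)))  # H_3(S) = H_2(S) * log_3(2)
```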
Average Information 


The entropy function involves only the distribution of 
the probabilities (frequencies of occurrence): it is a 
function of the probability distribution $p_i$ and does not 
involve the symbols $s_i$ themselves. 

X = {Rain, Fine, Cloudy, Snow} = {R, F, C, S} 

P(R) = 1/4, P(F) = 1/4, P(C) = 1/2, P(S) = 0 
$H_2(X) = 1.5$ bits/symbol 

If P(i) = 1/4 for each i, $H_2(X) = 2$ bits/symbol (> 1.5) 
(equal-probability events) 

H(X) = 0 for a certain event: 

$P(a_i) = 0$ or $P(a_i) = 1$ 
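The weather example can be reproduced numerically. The sketch assumes the intended probabilities are 1/4, 1/4, 1/2 and 0, which is the assignment that yields the stated 1.5 bits/symbol; the helper name `entropy_bits` is introduced here for illustration.

```python
import math


def entropy_bits(probs):
    """Entropy in bits/symbol; zero-probability outcomes contribute nothing."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)


weather = {"R": 0.25, "F": 0.25, "C": 0.5, "S": 0.0}   # assumed reading of the slide
uniform = {"R": 0.25, "F": 0.25, "C": 0.25, "S": 0.25}
certain = {"R": 1.0, "F": 0.0, "C": 0.0, "S": 0.0}

print(entropy_bits(weather.values()))  # skewed distribution
print(entropy_bits(uniform.values()))  # equal-probability events
print(entropy_bits(certain.values()))  # a certain event carries no information
```

Note that only the probability values enter the computation; relabeling the outcomes leaves the entropy unchanged.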


Average Information 

A discrete random variable X takes values in 
{0, 1}. Its probability distribution is given by 

$$P\{X = 1\} = p = 1 - P\{X = 0\}$$

Calculate the entropy of X. 

$$H(X) = -p \log_2 p - (1 - p) \log_2 (1 - p) = H_2(p)$$

Sketching $H_2(p)$ versus $p$ gives the following graph 

[Figure: plot of $H_2(p)$ versus $p$] 

From the graph, we observe the following: 

the maximum is achieved when $p = \frac{1}{2}$; 

$H_2(p) = 0$ when $p = 0$ or $p = 1$ (there is no uncertainty) 
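The shape of the binary entropy function can be explored without the plot. The sketch below evaluates it on a grid; the function name `binary_entropy` and the grid spacing of 0.01 are choices made for this example, and the endpoints use the usual convention that 0 · log 0 = 0.

```python
import math


def binary_entropy(p):
    """H_2(p) = -p*log2(p) - (1-p)*log2(1-p), with 0*log2(0) taken as 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


grid = [i / 100 for i in range(101)]
best_p = max(grid, key=binary_entropy)

print(best_p, binary_entropy(best_p))                # location and value of the maximum
print(binary_entropy(0.0), binary_entropy(1.0))      # no uncertainty at the endpoints
print(math.isclose(binary_entropy(0.25), binary_entropy(0.75)))  # symmetric about 1/2
```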







$$0 \le H(P) \le \log(n)$$


We have $H(P) = 0$ when exactly one of 
the $p_i$'s is one and all the rest are zero. 

We have $H(P) = \log(n)$ only when all of 
the events have the same probability $\frac{1}{n}$. 

That is, the maximum of the entropy 
function is the logarithm of the number of 
possible events, and occurs when all the 
events are equally likely. 
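A sketch of why the upper bound holds, assuming the bound is read as $0 \le H(P) \le \log n$ with $n$ the number of possible events. The argument uses the standard inequality $\ln x \le x - 1$ (equality only at $x = 1$), which is brought in here and is not part of the slides:

```latex
\begin{align*}
H(P) - \log n
  &= \sum_{i:\,p_i > 0} p_i \log \frac{1}{p_i} - \sum_{i:\,p_i > 0} p_i \log n
   = \sum_{i:\,p_i > 0} p_i \log \frac{1}{n p_i} \\
  &\le (\log e) \sum_{i:\,p_i > 0} p_i \left( \frac{1}{n p_i} - 1 \right)
   = (\log e) \left( \sum_{i:\,p_i > 0} \frac{1}{n} - 1 \right)
   \le 0 .
\end{align*}
% Equality requires 1/(n p_i) = 1 for every i, i.e. p_i = 1/n for all i.
```

The lower bound follows because every term $p_i \log (1/p_i)$ is non-negative, and all terms vanish only when a single $p_i$ equals one.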



